
feat(maintainers): serve-time role resolution from a live maintainers table (+ label/review view perf fix)#192

Merged: entrius merged 1 commit into test from feat/serve-time-maintainer-resolution on Jun 17, 2026


@anderdc (Collaborator) commented Jun 16, 2026

Why

author_association and reviewer_association are snapshotted at ingest and never refreshed, so role-based scoring (issue-bonus tier, PR-drop gate, maintainer review gate) reads stale roles. PR #190 patched this by rewriting the stored columns across 4 tables hourly, but that is the wrong layer: it causes write amplification, leaves an hour-long staleness window, covers only registered+installed repos, and corrupts the event-time meaning of the stored snapshot.

Separately, per-row label/review association resolution went through contributor_repo_roles, a plain view that is re-derived for every row (EXPLAIN ANALYZE: ~30ms each, a sort + DISTINCT ON over ~8k rows across pull_requests/issues/reviews/comments, evaluated 1–2× per PR). That is the ~70ms/row that times the validator out on high-volume miners.

Both collapse into one primitive: a live, indexed maintainers table.
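To make the primitive concrete, here is a minimal in-memory sketch in TypeScript (the language of the das service). It is an illustration under stated assumptions, not code from this PR: MaintainersIndex, upsert, and lookup are hypothetical names, and a Map keyed on the lowercased repo name plus github_id stands in for the composite primary key. The point is that resolving a role becomes one keyed lookup instead of a sort + DISTINCT ON re-derivation per row.

```typescript
// Hypothetical in-memory stand-in for the maintainers table (illustration only).
interface MaintainerRow {
  repoFullName: string; // stored lowercased, like the real column
  githubId: string;
  login: string | null;
  association: string; // e.g. "OWNER" | "MEMBER" | "COLLABORATOR"
  refreshedAt: Date;
}

class MaintainersIndex {
  // One entry per (repo_full_name, github_id), mirroring the composite PK.
  private readonly rows = new Map<string, MaintainerRow>();

  private static key(repoFullName: string, githubId: string): string {
    // Lowercase on both write and read, as in m.repo_full_name = LOWER(src.repo_full_name).
    // The NUL separator keeps the composite key unambiguous.
    return `${repoFullName.toLowerCase()}\u0000${githubId}`;
  }

  // Insert or overwrite the row for this (repo, user) pair.
  upsert(repoFullName: string, githubId: string, login: string | null, association: string): void {
    this.rows.set(MaintainersIndex.key(repoFullName, githubId), {
      repoFullName: repoFullName.toLowerCase(),
      githubId,
      login,
      association,
      refreshedAt: new Date(),
    });
  }

  // Single keyed lookup; null means "not a maintainer of this repo".
  lookup(repoFullName: string, githubId: string): string | null {
    const row = this.rows.get(MaintainersIndex.key(repoFullName, githubId));
    return row ? row.association : null;
  }
}

const index = new MaintainersIndex();
index.upsert("Acme/Widgets", "42", "alice", "MEMBER");
console.log(index.lookup("acme/widgets", "42")); // MEMBER
console.log(index.lookup("acme/widgets", "7")); // null
```

In SQL the same effect comes from storing repo_full_name lowercased once and applying LOWER() only to the source side of the join, so the PK index on maintainers stays usable.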

What

  • New table maintainers (repo_full_name, github_id, login, association, refreshed_at), PK (repo_full_name, github_id). repo_full_name stored lowercased so every read joins m.repo_full_name = LOWER(<src>.repo_full_name) and still uses the PK index.
  • Replace #190's column rewrite (feat(maintainer): reconcile stored author_association to live GitHub roles) with MaintainerPopulateService. It reuses the same fetcher (fetchRepoCollaborators + fetchOrgMembers) and safety rules (fetch-before-write, fail-closed-per-repo, skip-on-empty), but atomically upserts maintainers per repo inside a transaction instead of mutating 4 tables. Runs hourly via @Cron and once on OnModuleInit.
  • Repoint off contributor_repo_roles onto maintainers: pr_labels_by_actor/issue_labels_by_actor (actor_association), pr_linked_issues (issue_author_association), and pr_review_summary (the maintainer CHANGES_REQUESTED filter). Each becomes an indexed lookup instead of a per-row re-derivation, roughly 10–20× faster on the hot path.
  • Serve-time author resolution in miners.service.ts via COALESCE(maintainers.association, stored) on PR/issue authors. It resolves onto the existing author_association field: no new payload field, no gittensor change. This is safe because the validator only ever tests … in MAINTAINER_ASSOCIATIONS, so a maintainers-only table (present → role, absent → NULL/stored) is lossless.
  • Repoint /repos/:repo/maintainers to the table. Stored association columns marked as ingest snapshots; contributor_repo_roles left in place (no remaining hot-path consumers).
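The populate safety rules above can be sketched as a pure write step that only ever receives a finished fetch. This is a hypothetical TypeScript illustration, not MaintainerPopulateService itself: the FetchResult shape, applyRepoFetch, fetchRepoRoles, and its two fetch callbacks (stand-ins for fetchRepoCollaborators and fetchOrgMembers, whose real signatures are not shown here) are all assumptions. Copying the repo's map and swapping it in with a single set() stands in for the per-repo transaction.

```typescript
// Hypothetical shapes for illustration; not the PR's types.
interface RoleRow {
  githubId: string;
  login: string;
  association: string; // e.g. "COLLABORATOR" or "MEMBER"
}

type FetchResult = { ok: true; rows: RoleRow[] } | { ok: false; error: string };

// repo_full_name (lowercased) -> github_id -> row
type MaintainerStore = Map<string, Map<string, RoleRow>>;

type RepoOutcome = "upserted" | "skipped-empty" | "failed";

// Fetch-before-write: both sources are fetched before anything is written,
// and any error is turned into a failed result instead of a partial write.
async function fetchRepoRoles(
  repo: string,
  fetchCollaborators: (repo: string) => Promise<RoleRow[]>, // assumed stand-in
  fetchMembers: (repo: string) => Promise<RoleRow[]>, // assumed stand-in
): Promise<FetchResult> {
  try {
    const [collaborators, members] = await Promise.all([fetchCollaborators(repo), fetchMembers(repo)]);
    return { ok: true, rows: [...collaborators, ...members] };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

function applyRepoFetch(store: MaintainerStore, repo: string, result: FetchResult): RepoOutcome {
  // Fail-closed per repo: keep this repo's existing rows; other repos are unaffected.
  if (!result.ok) return "failed";
  // Skip-on-empty: an empty result never changes the table.
  if (result.rows.length === 0) return "skipped-empty";

  const key = repo.toLowerCase();
  // Stage every upsert on a copy, then publish it in one step.
  const staged = new Map<string, RoleRow>(store.get(key) ?? new Map<string, RoleRow>());
  // If a user appears in both sources, the later row wins (an assumption of this sketch).
  for (const row of result.rows) staged.set(row.githubId, row);
  store.set(key, staged);
  return "upserted";
}
```

Keeping the write step synchronous and separate from the fetch makes the rules easy to check in isolation: a failed or empty fetch returns early before the store is touched.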

Validation

Loaded the full packages/db schema (incl. 11_maintainers + repointed views 21/22/24/25) into a throwaway Postgres and seeded a stale-snapshot case. All pass:

| check | result |
|---|---|
| PR author stored CONTRIBUTOR, present in maintainers | served MEMBER |
| PR author not in maintainers | served CONTRIBUTOR (stored fallback) |
| pr_labels_by_actor actor: maintainer / non-maintainer | MEMBER / NULL |
| pr_review_summary maintainer CR count (stale stored reviewer_association) | 1 (resolved via table) |
| /maintainers with mixed-case repo input | resolves (case-insensitive join) |
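The first three validation rows can be mirrored with a small TypeScript sketch of the two resolution paths. It is illustrative only: ASSUMED_MAINTAINER_ASSOCIATIONS is a guess at what MAINTAINER_ASSOCIATIONS contains (the PR does not list it), and the function names are hypothetical. The payload path applies COALESCE(m.association, stored), while the repointed views read the table value alone and therefore yield NULL for non-maintainers.

```typescript
// ASSUMPTION: stand-in for the validator's MAINTAINER_ASSOCIATIONS set.
const ASSUMED_MAINTAINER_ASSOCIATIONS: ReadonlySet<string> = new Set(["OWNER", "MEMBER", "COLLABORATOR"]);

// Payload path (miners.service.ts): COALESCE returns its first non-NULL argument,
// so a live maintainers row overrides the ingest snapshot.
function resolveAuthorAssociation(live: string | null, stored: string | null): string | null {
  return live ?? stored;
}

// View path (e.g. pr_labels_by_actor): the table value alone, NULL when absent.
function resolveActorAssociation(live: string | null): string | null {
  return live;
}

// How a consumer that only tests set membership sees the resolved value.
function isMaintainer(association: string | null): boolean {
  return association !== null && ASSUMED_MAINTAINER_ASSOCIATIONS.has(association);
}

console.log(resolveAuthorAssociation("MEMBER", "CONTRIBUTOR")); // MEMBER
console.log(resolveAuthorAssociation(null, "CONTRIBUTOR")); // CONTRIBUTOR
console.log(resolveActorAssociation(null)); // null
console.log(isMaintainer(resolveAuthorAssociation(null, "CONTRIBUTOR"))); // false
```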

npm run build, lint, format:check clean in packages/das. No new deps (no lockfile change). No test files (team rule).


SQL migration checklist (prod) — run in this order

packages/db/*.sql only auto-runs on fresh Docker volumes. Prod has data, so run these manually against the mirror DB. Ordering matters: the table must be populated before the views are swapped, or views 24/25 emit NULL actor_association (every label actor reads as non-maintainer) during the gap.

1. Create the table (additive, safe anytime) — body of packages/db/11_maintainers.sql:

CREATE TABLE IF NOT EXISTS maintainers (
    repo_full_name  VARCHAR(255) NOT NULL,
    github_id       VARCHAR(255) NOT NULL,
    login           VARCHAR(255),
    association     VARCHAR(20)  NOT NULL,
    refreshed_at    TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    PRIMARY KEY (repo_full_name, github_id)
);

2. Deploy the app build. This is safe with an empty table: serve-time COALESCE(m.association, stored) falls back to the stored value, and the views have not been swapped yet. OnModuleInit triggers an immediate populate.

3. Verify that the table is populated (this should happen within seconds of boot; otherwise wait for, or trigger, one hourly run):

SELECT repo_full_name, COUNT(*) FROM maintainers GROUP BY 1 ORDER BY 2;   -- each registered+installed repo >= 1 (owner)
SELECT * FROM maintainers WHERE repo_full_name='phase-rs/phase' AND github_id IN ('1388610','59729252');  -- expect MEMBER/COLLABORATOR

4. Cut over the views (instant CREATE OR REPLACE VIEW; paste the new bodies from this PR):

  • 21_view_pr_review_summary.sql
  • 22_view_pr_linked_issues.sql
  • 24_view_pr_labels_by_actor.sql
  • 25_view_issue_labels_by_actor.sql

5. Verify serving + perf:

EXPLAIN ANALYZE <windowed pulls query for 156195510>;   -- no contributor_repo_roles node; per-row cost collapsed
  • GET /repos/phase-rs/phase/maintainers returns the live set.
  • Row payload for github_ids 1388610 / 59729252 shows MEMBER/COLLABORATOR; SELECT author_association … still shows the unmutated snapshot.

6. Rollback (if needed): CREATE OR REPLACE VIEW the four views back to their contributor_repo_roles/stored bodies (originals are on origin/test). Table + app can stay (harmless).
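Step 3 is the gate for step 4, so it can help to express it as an explicit preflight check. This is a hypothetical TypeScript sketch: it takes the rows returned by the GROUP BY query in step 3 plus a list of registered+installed repos (where that list comes from is an assumption, not something this PR defines) and reports which repos would still read NULL from the swapped views.

```typescript
// One row of: SELECT repo_full_name, COUNT(*) FROM maintainers GROUP BY 1
interface RepoCount {
  repoFullName: string;
  count: number;
}

// Repos that have no maintainers rows yet; compared case-insensitively,
// since maintainers.repo_full_name is stored lowercased.
function reposMissingMaintainers(expectedRepos: string[], counts: RepoCount[]): string[] {
  const populated = new Set(
    counts.filter((c) => c.count > 0).map((c) => c.repoFullName.toLowerCase()),
  );
  return expectedRepos.filter((repo) => !populated.has(repo.toLowerCase()));
}

// Only swap views 21/22/24/25 once every expected repo has at least one row (the owner).
function safeToSwapViews(expectedRepos: string[], counts: RepoCount[]): boolean {
  return reposMissingMaintainers(expectedRepos, counts).length === 0;
}
```

If the counts come from node-postgres, note that it returns COUNT(*) (a bigint) as a string by default, so convert it with Number() first.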


Follow-ups

…s table

PR #190 reconciled author_association by rewriting the stored column across
4 tables hourly — write-amplification, an hour staleness window, and it
corrupts the event-time meaning of the snapshot. Separately, the per-row
label/review association resolution went through contributor_repo_roles, a
plain view re-derived per row (~30ms each: sort + DISTINCT ON over ~8k
rows), which times the validator out on high-volume miners.

Both collapse into one primitive: a live, indexed maintainers table.

- Add maintainers (repo_full_name, github_id, login, association, refreshed_at),
  PK (repo_full_name, github_id); repo_full_name stored lowercased so reads
  join as m.repo_full_name = LOWER(src.repo_full_name) and still hit the PK.
- Convert the reconcile service into MaintainerPopulateService: same fetcher
  (direct collaborators + org members), same safety rules (fetch-before-write,
  fail-closed-per-repo, skip-on-empty), but atomically upserts the maintainers
  table per repo instead of rewriting 4 tables. Runs hourly + once on boot.
- Repoint pr_labels_by_actor / issue_labels_by_actor (actor_association),
  pr_linked_issues (issue_author_association), and pr_review_summary's
  maintainer filter off contributor_repo_roles onto maintainers (indexed
  lookup; ~10-20x).
- Resolve PR/issue author_association at serve time in miners.service via
  COALESCE(maintainers.association, stored) — no new payload field, so the
  validator needs no change.
- Repoint /repos/:repo/maintainers to the table. Mark stored association
  columns as ingest snapshots; leave contributor_repo_roles in place (no
  remaining hot-path consumers).
@entrius merged commit d6907d9 into test on Jun 17, 2026
2 checks passed
@entrius deleted the feat/serve-time-maintainer-resolution branch on June 17, 2026 at 00:08